# Document Image Parsing

VL3 SigLIP NaViT

The visual encoder for VideoLLaMA3. It uses Any-resolution Vision Tokenization (AVT) to process images and videos of varying resolutions dynamically.
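As a rough illustration of what processing inputs at their native resolution implies, the sketch below counts how many patch tokens an image of a given size would produce when it is split into fixed-size patches instead of being resized to one fixed shape. The patch size of 14 and the helper names `patch_grid` and `token_count` are assumptions made for this example; they are not taken from the VideoLLaMA3 code, and the actual encoder may merge or prune tokens differently.

```python
import math


def patch_grid(height: int, width: int, patch_size: int = 14) -> tuple[int, int]:
    """Return (rows, cols) of patches needed to cover an image.

    Hypothetical helper: partial patches at the border are counted as
    whole patches, i.e. the image is assumed to be padded up.
    """
    if height <= 0 or width <= 0 or patch_size <= 0:
        raise ValueError("height, width and patch_size must be positive")
    return math.ceil(height / patch_size), math.ceil(width / patch_size)


def token_count(height: int, width: int, patch_size: int = 14) -> int:
    """Number of visual tokens for one image: one token per patch."""
    rows, cols = patch_grid(height, width, patch_size)
    return rows * cols


if __name__ == "__main__":
    # The token count grows with resolution instead of being fixed.
    for size in [(224, 224), (448, 224), (225, 224)]:
        print(size, patch_grid(*size), token_count(*size))
```

Under these assumptions the sequence length scales with the input area, which is why variable-resolution encoders typically need packing or masking to batch images of different sizes together.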

Transformers English

A document image understanding model fine-tuned from naver-clova-ix/donut-base-finetuned-cord-v2.

Donut Base Medical Handwritten Blocks Data Extraction

A model based on the Donut architecture, designed to extract structured data from handwritten medical documents.

Text Recognition


© 2025 AIbase